PREFACE

under the supervision of Mr. M. Crowell, Jr., Director, Research Institute.

COL  Daniel  L.  Lycan,  and  COL  Edward  K.  Wintz,  CE  were  Commanders 
and  Directors  and  Mr.  Robert  P.  Macchia  was  Technical  Director  of  the  Engineer 
Topographic Laboratories during the study and report preparation.


A STUDY OF THE HUMAN VISUAL SYSTEM
IN SUPPORT OF AUTOMATED FEATURE EXTRACTION


INTRODUCTION 

The purpose of this study is to analyze the architecture and functions of the
human visual system and to apply the results to the problem of automated feature
extraction. Most tasks performed by man require processing of visual input. Two
examples are sorting letters and interpreting aerial photographs. In both cases, a
huge amount of material, letters or images, has to be processed. If the tasks are to be
performed by machines, visual feature extraction must be carried out automatically.
So far, the success of automation has been limited. The study of the human visual
system and its unmatched capability may provide new leads to the problem of
automated feature extraction and machine intelligence.

The  analysis  of  the  architecture  and  functions  of  the  human  visual  system  is 
based  on  an  in-depth  examination  and  evaluation  of  literature  concerning  the  anatomy, 
neurology,  and  sensory  perception  of  the  human  visual  system.  In  recent  years,  new 
anatomical  staining  methods,  radioactive  tracing  techniques,  and  measurements  of 
neural responses using microelectrodes have greatly contributed to a better
understanding of the human visual system.

A brief description of the anatomy of the visual system is presented. The
information-processing functions of the components of the visual system are analyzed
and discussed. The architecture of the visual system is compared with the architecture
of a computer. Some ideas for translating functions of the visual system into artificial
intelligence are investigated.


ANATOMICAL  ARCHITECTURE  OF  THE  VISUAL  SYSTEM 

Block  Diagram  of  the  Visual  System  •  The  visual  system  consists  of  the 
following  major  components:  retina,  optic  nerve,  optic  chiasma,  optic  tract,  lateral 
geniculate nucleus, visual radiation, and visual cortex. In figure 1, a schematic diagram
of  the  visual  system  is  shown.  The  optical  image  that  is  projected  onto  the  retina  is 
converted  by  the  retina  into  a  pattern  of  neural  signals.  The  signals  are  transmitted 
to  the  lateral  geniculate  nucleus  by  nerve  fibers  that  originate  in  the  retina.  These 
fibers  form  the  optic  nerve,  the  chiasma,  and  the  optic  tract.  The  lateral  geniculate 
nucleus,  visual  radiation,  and  visual  cortex  are  integral  parts  of  the  brain.  The  visual 
systems  of  all  mammals,  including  man,  are  organized  in  the  same  manner.  Differences 
between species are seen in the complexity and detailed structure of the major
components. The functional building blocks of the visual systems are neurons (nerve cells).


Neurons • A neuron consists of a cell body, dendrites, and an axon. The
cell  body  is  compact  and  usually  spherical.  The  dendrites  are  extensions  that  branch 
out  and  form  tree-like  clusters  around  the  cell  body.  The  axon  is  an  extension  much 
longer than the dendrites and extends away from the cell body (see figure 2).


The  dendrites  receive  incoming  signals,  the  cell  body  integrates  the  signals, 
and  the  axon  transports  the  outgoing  signal  to  the  axon  terminals,  which  distribute 
the  information  to  a  new  set  of  neurons.  Information  is  transferred  from  one  neuron 
to  another  at  specialized  points,  which  are  called  synapses.  At  a  synapse,  an  axon 
terminal  is  separated  from  the  end  of  a  dendrite  only  by  a  very  narrow  gap.  The 
transfer  of  information  from  the  axon  to  the  dendrite  is  accomplished  by  chemical 
transmitters. Chemical transmitters are substances that are ejected by the axon
into the synaptic gap, diffuse across the gap, and are received by the dendrite. The
reception of transmitter molecules produces an electrical signal in the surface
membrane of the dendrite, which is conducted to the cell body. The signals arriving from
the various dendrites are combined by the cell body into a frequency-coded pulse
signal that is transmitted along the axon to other synapses. For all neurons, the pulse
amplitude is about 70 millivolts, the pulse duration is about 1 millisecond, and the
frequency or firing rate does not exceed approximately 800 pulses per second. The
electrical impulses are generated and transmitted by a process within the surface
membrane that is called the potassium-sodium ion pump.


There  are  two  types  of  synapses.  The  first  type  is  called  the  excitatory  synapse 
and  it  increases  the  firing  rate  of  the  neurons  upon  receiving  signals  across  the  synapses. 
The  second  type  is  called  the  inhibitory  synapse  and  it  reduces  the  firing  rate  in  the 
follow-on neuron. Most neurons receive input through many synapses (up to several
thousand), some of which are excitatory and some inhibitory. The sum of the
excitatory and inhibitory effects determines the firing rate of the neuron. Because the
flow of information within a neuron is always from the dendrite terminals to the
axon terminals, the neuron may be considered a directional information-processing
element.
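
The summation of excitatory and inhibitory inputs described above can be
illustrated with a minimal sketch. The linear weighted sum, the gain, and the
function name firing_rate are assumptions made for illustration only; the text
states only that the combined excitatory and inhibitory effects set the firing
rate and that the rate does not exceed approximately 800 pulses per second.

```python
MAX_RATE = 800.0  # pulses per second, the upper bound quoted in the text


def firing_rate(inputs, weights, gain=1.0, max_rate=MAX_RATE):
    """Sum weighted synaptic inputs and clamp the result to [0, max_rate].

    Positive weights stand for excitatory synapses and negative weights for
    inhibitory ones. The linear sum and the gain are illustrative assumptions.
    """
    drive = sum(w * x for w, x in zip(weights, inputs))
    return min(max(gain * drive, 0.0), max_rate)


# Three hypothetical presynaptic firing rates (pulses per second).
presynaptic = [100.0, 50.0, 80.0]
print(firing_rate(presynaptic, [1.0, 1.0, -1.0]))   # excitation dominates
print(firing_rate(presynaptic, [-1.0, -1.0, 1.0]))  # inhibition dominates
```

In this sketch the output depends only on the inputs and never feeds back to
them, which mirrors the one-way flow from dendrites to axon terminals.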


Retina  •  The  retina  is  a  layer  of  nerve  cells  of  approximately  spherical 
shape.  It  is  approximately  one-tenth  of  a  millimeter  thick  and  comprises  an  area  of 
approximately  9  square  centimeters.  The  retina  is  composed  of  three  major  layers 
of cells (receptors, bipolars, and ganglion cells), with lateral cross-connections
accomplished by amacrine and horizontal cells. The peripheral cells of the first layer are
photoreceptors: cones and rods. The center of the retina is called the fovea. It
is roughly circular, and covers approximately 2 square millimeters. The number of
cones per square millimeter is approximately 7000 over most of the retina, except
in the area within 2 mm from the center in which the fovea is located. The number
of cones decreases rapidly from the center.


The central fovea has a diameter of 0.5 mm and contains about 20,000 cones.
The overall fovea has a diameter of 1.6 mm and contains about 100,000 cones.
The cones at the center of the fovea are longer and thinner than those at the periphery.
The distance between the centers of two adjacent foveal cones is between 2.0 and
2.5 μm (micrometers). The total number of cones is approximately 6.5 million.
There are no rods at the central fovea, but the rods increase away from the fovea to
reach a maximum of 160,000 per square millimeter at a distance of 5 to 6 mm
from the center. In the more peripheral areas, the number of rods decreases, but
remains higher than the number of cones. The total number of rods is approximately
120 million.

Adjacent  to  the  level  of  receptors  is  a  layer  of  complex,  reciprocal  synaptic 
interconnections,  called  the  outer  plexiform  layer,  where  the  horizontal  cells  are 
found.  These  units  serve  to  interconnect  the  various  receptors  to  each  other  in  the 
lateral  dimension.  The  second,  or  inner,  nuclear  layer  consists  of  bipolar  cells  that 
transmit  incoming  signals  from  the  receptors  to  the  next  layer  of  cells,  the  amacrine 
cells.  The  latter  are  also  found  to  run  in  the  horizontal  dimension  among  bipolar 
cells  and  ganglion  cells,  which  make  up  the  third  layer  of  the  retina.  The  axons  of  the 
ganglion  cells  are  the  fibers  that  comprise  the  optic  nerve,  chiasma,  and  optic  tract  and 
send  visual  information  to  the  brain.  Thus,  information  is  transmitted  vertically  and 
horizontally  in  the  retina,  such  that  there  is  a  widespread  interaction  between  the  cells. 


The retina may be divided into two parts: the nasal hemi-retina and the
temporal hemi-retina.


Optic  Nerve,  Chiasma,  and  Optic  Tract  •  The  optic  nerve  fibers  leave  the 
eye  via  the  optic  disc,  an  area  containing  no  photoreceptors,  thus  called  the  “blind 
spot,”  at  approximately  16°  of  the  nasal  hemi-retina.  From  the  optic  disc,  the  optic 
nerves from the two eyes converge and meet at the chiasma, where the fibers from each
nasal hemi-retina cross to the opposite side of the brain. The fibers from each temporal
hemi-retina  enter  the  chiasma,  but  remain  on  the  same  side  of  the  brain  when  they 
exit  the  chiasma.  Thus,  some  fibers  cross  to  the  side  opposite  of  their  origin,  others 
do not. The ratio of crossed to uncrossed fibers is approximately 3:2. The optic
nerve fibers undergo a rearrangement of position along the course of the nerve. The
most  marked  rearrangement  occurs  with  those  fibers  that  serve  central  vision.  These 
are  progressively  displaced  from  the  middle  third  of  the  temporal  side  of  the  nerve 
toward  the  center. 
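
The crossing rule at the chiasma can be written as a small lookup. This is a
minimal sketch; the function name target_hemisphere and the string labels are
assumptions introduced for illustration.

```python
def target_hemisphere(eye, hemiretina):
    """Return the side of the brain reached by fibers from one hemi-retina.

    Nasal fibers cross at the chiasma; temporal fibers stay on the side of
    the eye they come from.
    """
    if eye not in ("left", "right"):
        raise ValueError(f"unknown eye: {eye!r}")
    if hemiretina == "temporal":
        return eye
    if hemiretina == "nasal":
        return "right" if eye == "left" else "left"
    raise ValueError(f"unknown hemi-retina: {hemiretina!r}")


for eye in ("left", "right"):
    for half in ("nasal", "temporal"):
        print(eye, half, "->", target_hemisphere(eye, half))
```

Each side of the brain therefore receives the nasal fibers of the opposite eye
and the temporal fibers of the eye on the same side.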


The  crossed  fibers  from  the  nasal  hemi-retina  and  the  uncrossed  fibers  from  the 
temporal  hemi-retina  form  the  optic  tract  on  each  side  of  the  brain.  These  fibers 
then  proceed  to  the  lateral  geniculate  nucleus  on  each  side  of  the  brain.  The  number 
of  fibers  of  the  optic  nerve  for  several  vertebrates  is  shown  below: 


    Man:        1,000,000        Dog:        154,000
    Pig:          681,000        Cat:        119,000
    Chicken:      414,000        Frog:        29,000
    Rabbit:       265,000        Hagfish:      1,579
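
The fiber count for man can be set against the receptor totals given in the
description of the retina (approximately 6.5 million cones and 120 million rods)
to estimate the average convergence onto the optic nerve. The following is a
rough worked calculation; it is an average over the whole retina and says
nothing about how convergence varies between fovea and periphery.

```python
CONES = 6.5e6                   # total cones, from the text
RODS = 120e6                    # total rods, from the text
OPTIC_NERVE_FIBERS = 1_000_000  # fibers in the human optic nerve, from the text


def receptors_per_fiber(cones, rods, fibers):
    """Average number of photoreceptors feeding one optic nerve fiber."""
    return (cones + rods) / fibers


ratio = receptors_per_fiber(CONES, RODS, OPTIC_NERVE_FIBERS)
print(f"{ratio:.1f} photoreceptors per optic nerve fiber on average")
```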

Lateral  Geniculate  Nucleus  and  Visual  Radiation  •  The  majority  of  the  optic 
tract  fibers  terminate  in  the  lateral  geniculate  nucleus  (LGN).  The  LGN  is  the  first 
synaptic  relay  station  in  the  path  from  the  retina  to  the  visual  cortex.  The  LGN 
has six layers of neurons, which are numbered from 1 to 6. The neurons of layers
1 and 2 have relatively large cell bodies, whereas the neurons of layers 3 to 6 are
small. The large, as well as the small, neurons have long axons that synapse with
neurons of the visual cortex. Each layer also contains neurons with short axons that
make  synaptic  connections  within  the  LGN.  These  cells  are  specialized  for  lateral 
transmission  among  layers  and  provide  a  means  for  lateral  interaction  and  neural 
integration  between  cells  at  all  layers  of  the  LGN.  Fibers  from  the  central  area  of  the 
retina  terminate  predominantly  in  the  wedge-shaped  middle  part  of  the  LGN,  and 
fibers  from  the  peripheral  area  of  the  retina  terminate  in  the  outer  portions  of  the 
LGN.  Thus,  the  retina  is  mapped  with  relatively  minor  distortion  into  the  LGN. 


Each LGN receives fibers from both eyes. Crossed and uncrossed fibers of the
optic tract terminate in separate layers of the LGN. Layers 1, 4, and 6 receive their
input from the eye on the opposite side via crossed fibers, and layers 2, 3, and 5
receive their input from the eye on the same side via uncrossed fibers. Descending
from the cortex are long axons with many synaptic contacts within each layer. These
axons establish a neural feedback network and allow signals from the cortex to
be integrated with those signals coming from the retina. The organization of the LGN
displays the same kind of lamination and lateral connection observed in the retina,
and the microanatomy reveals a kind of columnar organization also found in the
cortex.
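
The assignment of LGN layers to the two eyes can be summarized in a short
sketch. The names CROSSED_LAYERS, UNCROSSED_LAYERS, and source_eye are
illustrative assumptions; the layer numbers are those given in the text.

```python
CROSSED_LAYERS = {1, 4, 6}    # input from the eye on the opposite side
UNCROSSED_LAYERS = {2, 3, 5}  # input from the eye on the same side


def source_eye(lgn_side, layer):
    """Return which eye ('left' or 'right') drives a given layer of one LGN."""
    opposite = {"left": "right", "right": "left"}
    if lgn_side not in opposite:
        raise ValueError(f"unknown side: {lgn_side!r}")
    if layer in CROSSED_LAYERS:
        return opposite[lgn_side]
    if layer in UNCROSSED_LAYERS:
        return lgn_side
    raise ValueError(f"the LGN has layers 1 to 6, got {layer!r}")


for layer in range(1, 7):
    print("left LGN, layer", layer, "<-", source_eye("left", layer), "eye")
```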


The main efferent fibers from the LGN form the visual radiations. The fibers
fan  out  and  then  reassemble  in  a  compact  bundle  that  terminates  in  the  visual  cortex. 


Visual  Cortex  •  The  visual  cortex  is  a  laminated  structure  that  is  about  2 
millimeters  thick  and  covers  approximately  15  square  centimeters.  The  number  of 
neurons  in  the  visual  cortex  is  estimated  to  be  150  million.  The  neurons  are  arranged 
in  six  parallel  layers,  numbered  from  I  to  VI,  and  have  alternately  low  and  high 
neuron  density.  The  structure  of  each  layer  shows  a  marked  uniformity. 


The principal input to the visual cortex is provided by the visual radiation. The
fibers  of  the  visual  radiation  make  synaptic  connections  with  neurons  of  layer  IV. 
Other  input  is  provided  by  fibers  of  the  corpus  callosum,  which  is  a  bundle  of  nerve 
fibers  connecting  the  two  cerebral  hemispheres.  Fibers  of  the  corpus  callosum  make 
synaptic  connections  with  neurons  of  layers  I  and  II. 

The visual cortex, which essentially occupies Brodmann area 17 of the
cerebral cortex, is also connected with other parts of the brain through axons
projecting from the visual cortex. Axons from layer VI project mainly back to the LGN.
Axons from layer V project into the superior colliculus, which is believed to be a
visual interpretive system. Axons from layers II and III project into Brodmann
areas 18 and 19, which are known to re-process information that has been processed
through the visual cortex.


There are two broad classes of neurons in the visual cortex: pyramidal neurons,
having extended dendrite fields and long axonal extensions, and stellate neurons,
having dendrite fields distributed uniformly around the cell body.

Most stellate neurons are found in the central layers of the visual cortex,
especially in layer IV. Pyramidal neurons are concentrated in layers II, III, and
VI. The stellate neurons of layer IV receive the input signals from the visual
radiation, which are then carried to the other layers by pyramidal neurons. The pyramidal
neurons  establish  synaptic  contacts  with  stellate  neurons  and  carry  signals  vertically 
(normal  to  the  layer)  through  the  layers  of  the  visual  cortex.  Different  pyramidal  cells 
make  specific  connections  between  different  layers. 


Local  horizontal  connections  along  the  cortex  are  established  by  short  axon 
cells,  which  have  an  effective  spread  of  less  than  one-half  of  a  millimeter.  The  six  layers 
of  the  visual  cortex  are  interconnected  horizontally.  There  are  local  connections  along 
the  layers,  which  are  equivalent  to  a  connection  between  adjacent  regions  of  the  visual 
field. The retina is mapped through the visual pathway into the visual cortex. A
significant characteristic of the mapping is nonlinear distortion. For example, the central
part of the retina, which includes only 3 percent of the retinal surface, is represented
by almost half of the visual cortex. Thus, the retinal area with high acuity occupies a
disproportionately large part of the visual cortex.
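
The degree of this distortion can be estimated from the areas quoted in the
text (a retina of approximately 9 square centimeters and a visual cortex of
approximately 15 square centimeters). The following is a rough worked
calculation; rounding "almost half" to exactly one half is an assumption, as
is the function name areal_magnification.

```python
RETINA_AREA_CM2 = 9.0    # from the text
CORTEX_AREA_CM2 = 15.0   # from the text
CENTRAL_RETINA_FRACTION = 0.03
CENTRAL_CORTEX_FRACTION = 0.5  # "almost half", rounded (an assumption)


def areal_magnification(retina_fraction, cortex_fraction):
    """Square centimeters of cortex devoted to one square centimeter of retina."""
    cortex_area = cortex_fraction * CORTEX_AREA_CM2
    retina_area = retina_fraction * RETINA_AREA_CM2
    return cortex_area / retina_area


central = areal_magnification(CENTRAL_RETINA_FRACTION, CENTRAL_CORTEX_FRACTION)
peripheral = areal_magnification(1 - CENTRAL_RETINA_FRACTION,
                                 1 - CENTRAL_CORTEX_FRACTION)

print(f"central retina:    {central:.1f} cm2 of cortex per cm2 of retina")
print(f"peripheral retina: {peripheral:.2f} cm2 of cortex per cm2 of retina")
print(f"central relative to peripheral: {central / peripheral:.1f}x")
```

Under these assumptions a unit area of central retina receives roughly thirty
times as much cortex as a unit area of peripheral retina.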


INFORMATION  PROCESSING  OF  THE 
VISUAL  SYSTEM 

Effect  of  Eye  Motion  •  The  eyes  project  an  image  of  the  external  world 
onto the retina and are constantly in motion. Even during steady fixation of a
stationary object, eye movement occurs, which comprises three components: flick,
drift, and tremor. The flicks occur at irregular intervals of 1/300 of a second to 5
seconds and have an amplitude of approximately 20 minutes of arc. Between flicks,
the eye drifts slowly at about 1 degree per second. Superimposed upon these movements
is a high-frequency tremor with an amplitude of up to one-half minute of arc and a
frequency of up to 150 per second.


The  eye  movement  causes  a  corresponding  motion  of  the  image  on  the  retina. 
When  the  image  of  a  pattern  is  stabilized  on  the  retina,  the  perceived  outline  and 
contrast  in  the  pattern  begin,  after  approximately  1  second,  to  deteriorate  and 
finally  to  disappear.  The  stabilization  of  an  image  on  the  retina  is  accomplished  by 
an optical arrangement in which a mirror is attached to the eyeball to compensate
for the eye movements. If the image is flickered or caused to move on the retina, it
immediately reappears. The flick movements of the eye are most important for
maintaining optimal accuracy.

A  basic  property  of  the  visual  system  is  its  sensitivity  to  local  contrast.  The  eye’s 
ability  to  detect  an  edge  or  contour  is  determined  by  the  change  in  brightness  of  the 
visual  field  across  that  contour.  Within  a  wide  range,  the  absolute  level  of  illumination 
is relatively unimportant. The spatial changes in brightness of the image appear to the
photo receptors of the retina as temporal changes of brightness because the involuntary
eye movement produces a time-varying input to the receptors. Only the changes of
luminance  at  the  photo  receptors  in  the  retina  signal  the  pattern  of  the  retinal  image 
to  the  brain.  The  visual  neurons  of  the  retina,  LGN,  and  visual  cortex  respond  to 
changes  in  luminance,  whether  these  are  produced  in  time  by  light-on  or  light-off  or 
in  space  by  contrast  patterns  shining  upon  their  receptive  fields.  Such  stimuli  are 
created  under  natural  conditions  by  the  voluntary  and  involuntary  eye  movements 
so  that  a  continuously  changing  neural  input  from  the  eye  is  provided  to  the  visual 
brain. 
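
The conversion of spatial contrast into temporal change by eye movement can be
illustrated with a one-dimensional sketch. The step-shaped luminance profile,
the list of eye offsets, and the function names are assumptions chosen for
illustration; they are not measurements.

```python
def luminance(x, edge=0.0, dark=0.2, bright=1.0):
    """Luminance of a stationary pattern with a single edge at position `edge`."""
    return bright if x >= edge else dark


def temporal_changes(receptor_x, eye_offsets):
    """Changes in luminance seen by one receptor as the eye offset varies."""
    samples = [luminance(receptor_x + offset) for offset in eye_offsets]
    return [abs(later - earlier) for earlier, later in zip(samples, samples[1:])]


offsets = [0.0, 0.3, -0.3, 0.2, -0.2]  # hypothetical small eye movements
near_edge = temporal_changes(0.1, offsets)           # receptor close to the edge
far_from_edge = temporal_changes(5.0, offsets)       # receptor in a uniform area
stabilized = temporal_changes(0.1, [0.0] * len(offsets))  # no image motion

print("near edge:    ", near_edge)
print("far from edge:", far_from_edge)
print("stabilized:   ", stabilized)
```

Only the receptor near the edge sees a changing input, and that input vanishes
when the image is stabilized, which is consistent with the fading of stabilized
images described above.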


Receptive  and  Perceptive  Fields  •  The  receptive  field  concept  is  important 
in  understanding  the  visual  system.  The  receptive  field  of  an  individual  neuron  is 
defined  as  the  area  in  the  external  world  in  which  a  stimulus  elicits  activity  in  that 
neuron. Receptive fields are determined by electrical recordings from visual neurons
and are mapped by exploring the visual field with localized stimuli. Neural signals are
recorded by coupling a microelectrode to a neural axon. The microelectrode picks up the
signals transmitted along the axon, which are then amplified, recorded, or displayed.
Because inserting microelectrodes in the visual system requires surgical operations,
most of the recordings have been performed not on human beings but on rabbits,
cats, and monkeys. Receptive fields are also mapped by using radioactive tracing
methods. Receptive fields are found in the retina, the lateral geniculate nucleus,
and the visual cortex. In the same way as stimuli are applied to test animals for
receptive field research, stimuli may also be applied to human subjects. These experiments
lead to the concept of perceptive fields. Perceptive fields are defined as the
psychophysiological equivalents of visual receptive field organization in man, and they can be
estimated by visual phenomena, including illusions.


Receptive Fields and Information Processing in the Retina • The ganglion
cells of the retina are the neurons that determine the retinal receptive fields. The
structure of retinal receptive fields shows two regions, an approximately circular
center and a ring-shaped surround. If center and surround are equally and diffusely
illuminated, the cell response is weak. If the illumination is non-uniform, containing,
for example, a border between light and dark areas, the cell may respond strongly. A retinal
ganglion cell responds best to a roughly circular spot of light of a particular size in a
particular part of the visual field. The size is critical because each cell’s receptive field
is divided into an excitatory center and an inhibitory surround, or into the reverse
configuration.
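
The dependence of the response on spot size can be illustrated with a minimal
center-surround sketch on a small pixel grid. The grid size, the radii, and
the rule "mean of center minus mean of surround" are assumptions chosen for
illustration; they are not a model taken from the literature reviewed here.

```python
import math


def center_surround_response(patch, center_radius=1.0, surround_radius=3.0):
    """Mean luminance over the central disc minus mean over the surrounding ring."""
    mid = (len(patch) - 1) / 2.0
    center, surround = [], []
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            distance = math.hypot(r - mid, c - mid)
            if distance <= center_radius:
                center.append(value)
            elif distance <= surround_radius:
                surround.append(value)
    return sum(center) / len(center) - sum(surround) / len(surround)


def spot(size, radius):
    """Square patch with a bright circular spot of the given radius at its middle."""
    mid = (size - 1) / 2.0
    return [[1.0 if math.hypot(r - mid, c - mid) <= radius else 0.0
             for c in range(size)] for r in range(size)]


uniform = [[1.0] * 7 for _ in range(7)]
print("diffuse illumination:", center_surround_response(uniform))
print("spot filling center: ", center_surround_response(spot(7, 1.0)))
print("spot filling field:  ", center_surround_response(spot(7, 3.0)))
```

In this sketch the response is largest when the spot just fills the center and
falls to zero when the spot also covers the whole surround.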


There  are  retinal  ganglion  cells  with  wide  dendrite  fields  and  others  with  small 
dendrite fields. Because the wide dendrite fields receive a large amount of converging
information, and small dendrite fields, small amounts of information, the feature
extraction operation is instrumental in distinguishing large objects from small ones.
The size of receptive field centers in the fovea varies from 20 to 60 μm in diameter.
The dimensions of an entire receptive field, including center and surround, vary
between approximately 100 and 200 μm in the central region of the retina. The
mean diameter of receptive field centers increases from the center of the retina to its
periphery. The receptive field centers of the peripheral retina have diameters of
approximately 1 mm. Schematics of foveal and peripheral receptive fields are shown
in figures 3 and 4.


The  receptive  fields  of  the  retina  are  divided  into  two  classes.  In  the  first  class, 
the  ganglion  cells  of  the  receptive  fields  are  called  on-center  cells.  They  will  increase 
their  firing  rate  when  the  center  of  the  field  is  illuminated  and  decrease  their  firing 
rate  when  the  surround  is  illuminated.  In  the  second  class,  the  response  of  the  ganglion 
cells is reversed. That is, illumination of the center decreases the firing rate, and
illumination of the surround increases the firing rate. The populations of ganglion cells
of both classes are approximately equal in numbers as well as in their distribution
over the retina. The ganglion cells of the second class are called off-center cells.
On-center cells with their center region located just on the bright side of a border will
be most activated; those on the dark side will be most inhibited.
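
The behavior of the two classes near a border can be sketched in one
dimension. The step image, the surround width of two samples on each side, and
the function names are assumptions made for illustration only.

```python
def step_image(n=20, edge=10, dark=0.0, bright=1.0):
    """One-dimensional luminance profile: dark on the left, bright from `edge` on."""
    return [dark if i < edge else bright for i in range(n)]


def on_center_drive(image, i, surround=2):
    """Center sample minus the mean of the neighboring samples on both sides."""
    neighbors = [image[j] for j in range(i - surround, i + surround + 1) if j != i]
    return image[i] - sum(neighbors) / len(neighbors)


def off_center_drive(image, i, surround=2):
    """Off-center cells respond with the opposite sign."""
    return -on_center_drive(image, i, surround)


image = step_image()
on = {i: on_center_drive(image, i) for i in range(2, 18)}
most_excited = max(on, key=on.get)
most_inhibited = min(on, key=on.get)

print("on-center cell most excited at position  ", most_excited)
print("on-center cell most inhibited at position", most_inhibited)
print("off-center drive at that dark-side position:",
      off_center_drive(image, most_inhibited))
```

The strongest excitation of the on-center cells falls on the first bright
sample and the strongest inhibition on the last dark sample, where the
off-center cells are in turn most strongly driven.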


Greatest retinal activity is thus associated with border regions. The retina is
particularly responsive to sharp contrast borders. This property of border enhancement
is clearly important in the discrimination of the shape of features. The border contrast
is coded by a combination of “darkness” signals from off-center cells on the dark
side of the border and “brightness” signals from on-center cells on the bright side. The
fact that the retina is coding changes in brightness rather than absolute brightness of
each small segment of the pattern may be considered as a first step of signal processing
to economize the flow of visual information. The visual system might then interpret
the comparative brightness of two areas in terms of the luminance gradient at the
border between them.

Figure 4. Schematics of Peripheral Receptive Field.


The  reciprocal  system  organization  of  the  two  classes  of  receptive  fields  has 
advantages  for  information  transmission  and  perception  because  it  sends  signals  of 
opposing  visual  qualities  via  specific  channels  from  the  eye  to  the  brain  and,  hence, 
doubles  the  sensory  range  of  contrast  vision. 


In summary, the receptive fields of the retina are less concerned with evaluating
absolute levels of brightness than they are with detecting contrast of brightness within
receptive fields.


Signal  Flow  and  Processing  in  the  LGN  •  The  LGN  receives  inputs  from 
the  retinas  of  both  eyes  and  the  visual  cortex.  The  LGN  is  made  up  of  six  layers  of 
neurons arranged in a shell-like manner. Each layer receives an ordered
two-dimensional map of the retina. The projections of the retina into the different layers of the
LGN are in register; that is, a single corresponding point on the two retinas projects
to points on the six neural layers of the LGN that lie one above the other. Thus,
projection  lines  can  be  drawn  through  the  six  layers  corresponding  to  a  single  retinal 
projection  point.  Hence,  corresponding  small  areas  of  each  retina,  approximately  the 
size  of  a  receptive  field,  are  represented  as  a  column  of  cells  in  the  LGN.  The  column 
systems  are  a  solution  to  the  problem  of  portraying  more  than  two  dimensions  on  a 
two-dimensional  surface.  There  are  two  principal  classes  of  neurons  in  the  LGN: 
(1)  relay  cells  that  pass  signals  received  from  the  retina  onwards  to  the  visual  cortex, 
and  (2)  short  axon  neurons  that  transmit  inhibitory  signals  within  the  different 
layers  of  the  LGN.  The  fibers  carrying  signals  from  the  visual  cortex  back  to  the 
LGN  influence  all  six  layers.  Thus,  signals  from  one  eye  arriving  at  the  LGN  via 
the  cortex  can  suppress  direct  neuronal  signals  arriving  in  the  LGN  from  the  other 
eye. 


The  transformation  of  visual  coding  taking  place  in  the  LGN  is  best  approached 
in  terms  of  receptive  field  organization.  LGN  fields  are  concentric  in  organization, 
with antagonistic center and surround regions. Like retinal fields they respond less to
diffuse illumination, which activates both center and surround simultaneously, but
are strongly excited by spot stimuli that stimulate just the center region. The principal
difference from retinal fields is the greater suppressive effect of stimulation of the
surround on responses evoked from the center. LGN cells receive excitatory influence
from one or a few retinal cells and therefore resemble ganglion cell fields in many
ways, but they also receive inhibitory retinal input that modifies their receptive field
properties. The modifications seem to make the LGN fields more selective for
stimulus features than retinal fields. Most LGN cells reveal increases in rates of firing to
stimuli  whose  diameters  are  up  to  about  half  the  area  of  the  center  of  the  receptive 
field. 


Signal  Processing  in  the  Visual  Cortex  •  Visual  information  is  carried  from 
the  lateral  geniculate  nucleus  to  the  visual  cortex  by  fibers  of  the  visual  radiation.  The 
fibers terminate in layer IV of the visual cortex. The neurons in layer IV that
receive  the  signals  directly  from  the  lateral  geniculate  nucleus  have  circular  receptive 
fields  with  center  and  antagonistic  surround.  They  respond  best  to  spot  light  stimuli 
and  are  monocular  (respond  only  to  signals  from  one  eye).  Layer  IV  contains 
another  group  of  neurons  that  do  not  have  circular  receptive  fields.  These  fields  are 
called  “simple  receptive  fields”  and  the  neurons  are  called  “simple  cells.”  Simple 
cells respond to an optimally oriented line stimulus in a narrowly defined location.
Like  retinal  fields,  many  simple  fields  have  antagonistic  “on”  and  “off”  areas, 
which  are  arranged  in  parallel  strips  and  not  concentrically. 
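
The selectivity of a simple field for orientation and position can be
illustrated with a minimal sketch in which the "on" and "off" areas are
parallel strips of a 3 by 3 grid. The weights, the rectification at zero, and
all names are assumptions chosen for illustration.

```python
# Hypothetical simple fields: an "on" strip flanked by two "off" strips.
VERTICAL_FIELD = [[-1, 2, -1],
                  [-1, 2, -1],
                  [-1, 2, -1]]
HORIZONTAL_FIELD = [[-1, -1, -1],
                    [2, 2, 2],
                    [-1, -1, -1]]


def simple_cell_response(field, patch):
    """Weighted sum of the patch over the field, with negative drive cut to zero."""
    drive = sum(f * p
                for field_row, patch_row in zip(field, patch)
                for f, p in zip(field_row, patch_row))
    return max(drive, 0)


vertical_line = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal_line = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
shifted_line = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
diffuse_light = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(simple_cell_response(VERTICAL_FIELD, vertical_line))    # preferred stimulus
print(simple_cell_response(VERTICAL_FIELD, horizontal_line))  # wrong orientation
print(simple_cell_response(VERTICAL_FIELD, shifted_line))     # wrong location
print(simple_cell_response(VERTICAL_FIELD, diffuse_light))    # diffuse illumination
```

Only the line with the preferred orientation in the preferred location drives
the cell; the same line moved onto an "off" strip, a line at right angles, and
diffuse illumination all leave it silent in this sketch.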


Layers  II,  III,  V,  and  VI  are  populated  by  neurons  that  are  called  “complex 
cells”  and  have  receptive  fields  that  are  called  “complex  fields.”  Complex  receptive 
fields  respond  to  moving  bar  stimuli  and  are  selective  for  the  orientation  of  the  bar. 
Complex  fields  are  larger  than  simple  fields  and  are  selective  for  the  contrast  and  width 
of  bar  stimuli. 


Layers  II  and  III  also  contain  neurons  that  have  “hypercomplex  fields.” 
Hypercomplex  fields  have  a  higher  stimulus  sensitivity  than  that  of  simple  and  complex 
fields,  and  they  select  the  orientation,  width,  contrast,  length,  and  direction  of  the 
stimulus. 


One  of  the  properties  of  the  visual  cortex  is  its  sub-organization  into  functional 
columns.  Each  receptive  field  has  a  certain  preferred  orientation.  Cells  with  the  same 
orientation preference are grouped together into columns. Each small area of the
retina is represented by a considerable population of cortical receptive fields. Features
of  a  stimulus  appear  to  be  coded  in  terms  of  differential  activity  among  functional 
subgroups  of  that  population.  Cortical  cells  generally  have  small  receptive  fields, 
especially  those  cells  representing  the  fovea.  The  retinal  position  of  a  stimulus  feature 
can  be  coded  in  terms  of  which  sub-population  of  cells  is  activated.  Contours  with 
different  orientations  will  excite  different  subgroups  of  cortical  fields,  even  though 
they  are  imaged  on  the  same  small  area  of  the  retina.  The  positions  of  the  ends  of  a 
contour  will  determine  which  group  of  hypercomplex  fields  is  excited. 


There  is  a  great  increase  between  the  retina  and  the  visual  cortex  in  the  number 
of cells involved in visual coding. For every retinal ganglion cell, there are some
hundreds of cells in the visual cortex. Cortical cells that are excited from the same small
area of the retina are selectively responsive to different orientations of the contrast
edge, to position of its ends, and to differences in the visual depth relative to the
fixation point. A change in any of these parameters or in retinal position results in a
change of population of excited cells.


The coding of features of a stimulus pattern does not appear to converge on a
single neuron or a few neurons, at least not in the cortex. Rather, considerable populations of
cortical cells are involved in the representation of a small area of the retina. Stimulus
features determine which subgroups of that population are excited, unaffected, or
inhibited.  The  activity  of  these  populations  is  presumably  sampled  or  integrated  by 
the  higher  visual  areas  of  the  cortex  to  provide  for  the  perception  of  complex  forms. 
Generally,  there  is  a  process  of  increasing  abstraction  of  the  visual  input  along  the 
neural  pathway  of  the  visual  system.  The  abstraction  must  be  independent  of  retinal 
position, size, brightness, and small distortions. It must take into account the recognition
of shape independent of contrast, the ability to abstract outlines of filled shapes,
and the ability to consider segments of a pattern separately. Virtually all tasks that
might  be  termed  “form  discrimination”  involve  visual  memory.  For  example,  the 
recognition  of  a  pattern  as  belonging  to  some  previously  learned  class,  or  the  discrimi¬ 
nation  between  two  patterns  in  learning  tasks,  both  clearly  involve  interaction  between 
visual  input  and  memory. 


Higher Level Processing. The visual information is processed along the
visual pathway by neurons that have characteristic receptive fields. The information
processing neurons of the retina, LGN, and layer IV of the visual cortex have receptive
fields of circular shape. The receptive fields are like area elements, and the
neurons respond to point features such as spatial and temporal contrast. Simple,
complex, and hypercomplex neurons of the cortex respond to line segments of specific
orientation, length, width, and contrast. The receptive fields of these neurons are more
complex in shape and structure than the receptive fields of the retina and LGN.
Figure 5 shows the antagonistic structure of the receptive fields of the visual
system. At higher levels of visual processing, the neurons may respond to increasingly
complex features that eventually lead to the visual perception of the external
world. This concept has not yet been confirmed by neurological research.
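The antagonistic structure of a circular receptive field can be sketched as follows. The sketch assumes a difference-of-Gaussians model, a common modelling choice that is not stated in the study; the kernel size and the two widths are arbitrary illustrative values. An on-center field of this kind gives almost no response to uniform illumination, an excitatory response to a spot on its center, and an inhibitory response to light falling only on its surround.

```python
import math

def dog_kernel(radius=6, sigma_center=1.0, sigma_surround=3.0):
    """On-center, off-surround kernel: narrow Gaussian minus wide Gaussian.

    Both Gaussians are normalized to unit sum, so the kernel sums to zero.
    """
    def gaussian_grid(sigma):
        grid = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                 for x in range(-radius, radius + 1)]
                for y in range(-radius, radius + 1)]
        total = sum(sum(row) for row in grid)
        return [[v / total for v in row] for row in grid]

    center = gaussian_grid(sigma_center)
    surround = gaussian_grid(sigma_surround)
    return [[c - s for c, s in zip(c_row, s_row)]
            for c_row, s_row in zip(center, surround)]

def response(kernel, patch):
    """Weighted sum of the light falling on the receptive field."""
    return sum(k * p for k_row, p_row in zip(kernel, patch)
               for k, p in zip(k_row, p_row))

size = 13  # matches radius=6
kernel = dog_kernel()
uniform = [[1.0] * size for _ in range(size)]
spot = [[1.0 if abs(x - 6) <= 1 and abs(y - 6) <= 1 else 0.0
         for x in range(size)] for y in range(size)]
ring = [[1.0 - v for v in row] for row in spot]  # light on the surround only

r_uniform = response(kernel, uniform)
r_spot = response(kernel, spot)
r_ring = response(kernel, ring)
print(f"uniform: {r_uniform:+.3f}  spot: {r_spot:+.3f}  surround: {r_ring:+.3f}")
```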


Three  experiments  are  discussed  that  deal  with  the  concept  of  trigger  features 
and  their  perception.  Thirty  persons,  including  20  adults  and  10  children,  parti¬ 
cipated  in  the  experiments.  Most  of  the  adults  were  college  graduates.  Five  of  the 
children  were  older  and  five  were  younger  than  10  years  old. 


In  the  first  experiment,  figures  6a  through  6d  were  shown  to  all  persons 
independently  and  at  different  times.  Figures  7a  through  7d  were  shown  in  the 
second  experiment.  In  both  experiments,  each  person  was  asked  what  he  or  she  saw 
when  looking  at  one  of  the  figures.  The  answers  in  both  experiments  were  divided  into 
two  categories,  namely  “yes,  I  recognize  something”  or  “no,  I  recognize  nothing.” 
Concentric  circles  were  recognized  in  the  first  experiment  and  a  dog  in  the  second. 
The results of the two experiments are shown in table 1.


In the third experiment, in which 22 people participated, a person was given
a page to read. During the reading, the page was rotated until reading became impossible.
Rotating the paper up to about 60° slowed down the reading only slightly.
Between 70° and 90°, a marked slowdown was observed. At approximately 100°
to 110°, reading became impossible. However, one test person was able to read
continuously without being noticeably affected as the page was turned one complete
revolution (360°).


In  all  experiments,  a  feature  (concentric  circles,  dog,  or  text)  was  changed 
until  the  test  person  was  unable  to  recognize  the  feature.  A  random  rotation  of  30° 
destroyed  the  impression  of  concentric  circles,  a  relocation  of  30  percent  of  elements 
made  the  presentation  of  the  dog  unrecognizable,  and  a  rotation  of  110°  made 
reading  of  the  text  impossible. 
TABLE 1. Perception Experiment

Each group (A, B, C, D) has a Yes column and a No column. The entries of each
row are listed in their printed order; [illegible] marks an entry that could
not be read.

Experiment 1

Figure    Entries (groups A through D)
6a        15, 4, 1, 4, [illegible], 2, 3
6b        11, 4, 2, [illegible], 5
6c        15, 5, 5, 5
6d        15, 5, 5, 5

Experiment 2

Figure    Entries (groups A through D)
7a        15, 5, 5, 5, [illegible]
7b        15, 5, 5, 5
7c        [illegible], 8, [illegible], [illegible], 3, 2, 5
7d        [illegible], 15, 5, 5, [illegible], 5

Note:  A: Adults with college background.
       B: Adults without college background.
       C: Children above ten years of age.
       D: Children below ten years of age.

The columns show the number of yes and no answers.


The process of visual recognition is apparently analogous to the process of
signal detection. In both visual recognition and signal detection, there are thresholds
beyond which the features cannot be recognized and the signals cannot be detected.
The thresholds in both processes depend on the signal-to-noise ratio. A signal that
has been corrupted by the transmission channel can still be detected by a matched
filter. A visual feature that has been corrupted, as shown in the experiments, can still
be recognized. There appear to be “feature detectors” at higher levels of visual
information processing in the brain that show threshold behavior and that function
like matched filters.
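The threshold behavior of a matched filter can be sketched as follows. The sketch assumes a known sinusoidal template, additive Gaussian noise, a normalized correlation score, and a decision threshold of 0.5; all of these are illustrative choices and none of them is taken from the study. The corrupted signal is still detected at a moderate noise level, and detection fails once the signal-to-noise ratio falls far enough.

```python
import math
import random

def normalized_correlation(received, template):
    """Matched-filter output at zero lag, scaled to the range -1..1."""
    dot = sum(r * t for r, t in zip(received, template))
    norm = math.sqrt(sum(r * r for r in received) * sum(t * t for t in template))
    return dot / norm if norm > 0.0 else 0.0

def detect(received, template, threshold=0.5):
    """Declare a detection when the score reaches the (assumed) threshold."""
    return normalized_correlation(received, template) >= threshold

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical template: five cycles of a sine wave over 200 samples.
template = [math.sin(2.0 * math.pi * 5.0 * n / 200.0) for n in range(200)]

def corrupt(signal, noise_std):
    """Add zero-mean Gaussian noise, standing in for the transmission channel."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]

# Scores for a clean, a moderately corrupted, and a heavily corrupted signal.
scores = {std: normalized_correlation(corrupt(template, std), template)
          for std in (0.0, 0.5, 5.0)}
for std, score in scores.items():
    print(f"noise std {std}: score {score:.2f}, detected: {score >= 0.5}")
```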


COMPARING  THE  VISUAL  SYSTEM 
AND  THE  COMPUTER 

Both  the  visual  system  and  the  computer  have  developed  in  an  evolutionary 
manner.  However,  the  driving  force  for  each  evolution  has  been  quite  different.  The 
successful  visual  system  has  to  solve  problems,  such  as  recognizing  danger,  food  and 
cartographic  features.  The  successful  computer  has  to  solve  mathematical  problems. 


All computers consist of an input/output (I/O) system, a central processing
unit (CPU), and a memory. Large computers may contain several million components.
The human visual system consists of a sensor system containing about 130 million
individual sensors and three sets of information processors: retina, LGN, and visual
cortex. These processors are connected by long nerve fibers. In a computer, information
is processed at a high speed and serially. Typical processing times in computers,
such as those for operating gates or shift registers, are approximately 1 microsecond or less.
The components of modern computers are highly reliable. Because of the serial computer
architecture, however, the failure of one or a few components may interrupt the entire
computation process.


The information processing time of the visual system is on the order of tens
of milliseconds, which is about 1,000 to 10,000 times slower than that of a computer.
Also, the performance of neurons is less reliable than the performance of computer
components. However, the deletion of quite a few neurons is unlikely to lead
to any appreciable difference in the performance of the visual system.


A computer operates on a short binary code. The visual system seems to rely
on less precise methods of signaling, probably adjusting the number and efficiency
of its synapses in a complex way to adapt its operation to experience. To execute a
specific mathematical process, one must program the computer. The visual system does
not depend on anything like a linear sequential program. Its architecture is more
likely to be thousands of circuits primarily in parallel and richly interconnected. The
visual system seems to rely on a strategy of relatively hard-wired complex circuitry
with elements working at low speed.


Human beings are rather unreliable and slow in accurately executing long,
complex arithmetical calculations, which can be done by a computer in a small fraction
of a second. Human beings, however, can recognize patterns in ways no contemporary
computer has ever been programmed to do. To translate the intelligence of the human
visual system into machine intelligence, one must be able to hybridize the predominantly
parallel architecture of the visual system with the high-speed serial architecture of
computers. Relatively short serial circuitry operating at very high speed may offer
trade-offs for large numbers of components in parallel circuitry.


TECHNICAL  APPROACH  TO  VISUAL 
FEATURE  EXTRACTION 

The  human  visual  system  is  capable  of  extracting  and  recognizing  from  its 
environment  an  enormous  number  of  features,  e.g.  faces  of  human  beings,  animals, 
vegetation,  houses,  cartographic  features  on  images,  and  so  forth.  Only  about  100,000 
to 150,000 photoreceptors of the central region of the retina are used for pattern
recognition.  This  includes  color  vision. 

Correspondingly,  about  100,000  fibers  of  the  optic  nerve  are  involved  in  the 
transmission  of  feature  information.  Because  of  the  highly  distorted  mapping  of  the 
retina  onto  the  visual  cortex  (the  central  part  of  the  retina  has  a  representation  about 
35  times  more  detailed  than  the  peripheral  part),  the  number  of  neurons  participating 
in  the  feature  extraction  process  is  estimated  to  be  about  60  million.  The  number  of 
cartographic  features  to  be  extracted  from  images  is  very  small  compared  to  the 
number  of  all  visual  features  that  the  human  visual  system  has  to  process.  Thus,  a 
visual  system  with  the  sole  purpose  of  extracting  cartographic  features  would  require 
much  less  processing  capability,  say  only  a  tenth  of  a  percent  of  that  of  the  human 
visual  system. 

A machine for automated cartographic feature extraction that is designed
according to the architecture of the human visual system would require an estimated
100,000 components for information processing. There are essentially two types of
imagery to be evaluated: photographic imagery, normally 9 inches (229 mm) wide,
and radar imagery, normally 2 inches (50 mm) wide. The
noise resolution of fine-grained film is about 1 µm. To obtain a reasonable signal-to-noise
ratio, one should integrate the sensor dimension over a linear distance that
is about 10 times the noise distance.


There are now linear and area-sensing arrays on the market having linear resolutions
of 15 to 100 µm. For example, there are sensing arrays with about 1700
sensor elements arranged over 1 inch (25 mm) length or arrays with 1024 elements
arranged in a square of 3.2 mm side length. At 40 MHz, the content of these two
types of arrays can be transferred to a temporary memory in about 42 or 25 microseconds,
respectively.  Sensing  arrays  can  be  arranged  in  bar-shaped  windows  across  the  image¬ 
carrying  film.  A  calculation  has  shown  that  it  is  possible  to  move  the  film  with  a  speed 
of  about  1  inch  per  second  over  the  window  containing  the  arrays.  The  film  could 
be  illuminated  homogeneously  by  a  strobe  light.  Strobe  light,  speed  of  film  transport, 
and  information  transfer  can  be  synchronized  by  a  high  precision  clock,  e.g.  a  rubidium 
clock.  The  information  content  of  the  sensing  arrays  can  be  transferred  into  a  set 
of  tapped  delay  lines  that  represent  the  area  of  the  window.  The  information  contained 
in  the  first  set  of  delay  lines  can  be  further  processed  for  texture  and  statistics,  and 
developed  for  contrast  and  contours.  The  contours  can  be  encoded  in  terms  of  their 
normalized  curvatures,  which  are  invariant  to  size,  location,  and  orientation.  The  coded 
signals  developed  in  this  way  have  to  be  processed  into  signals  that  represent  trigger 
features of specific cartographic features. Once unique signatures of cartographic
features in terms of coded trigger features are developed, a reference memory of these
codes can be designed, and incoming signals can be correlated against them for recognition.


Another  approach  to  feature  extraction  is  to  derive  characteristic  signal  signa¬ 
tures  of  features  using  functional  transforms,  statistical  and  texture  analysis,  and 
other analytical methods. The signal signatures derived from a feature must be measurable
and represent the components of a feature vector that is uniquely associated
with the feature. Recognition can then be accomplished by an autocorrelation with
a  replica  of  the  feature  vector  or  by  a  response  of  a  specially  designed  matched  filter. 
For  each  feature,  a  special  recognition  strategy  has  to  be  developed  that  will  include 
a  mix  of  the  various  operations. 
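Recognition by correlation with a stored replica of the feature vector can be sketched as follows. The feature names, the three-component vectors, and the acceptance threshold are hypothetical values chosen for illustration, and a normalized correlation stands in for whatever correlation measure an actual system would use. A measured vector is assigned to the best-matching reference only if the score reaches the threshold; otherwise it is rejected.

```python
import math

# Hypothetical reference memory: one stored feature vector per cartographic feature.
REFERENCE_MEMORY = {
    "road":     [0.9, 0.1, 0.2],
    "river":    [0.8, 0.7, 0.1],
    "building": [0.1, 0.0, 0.9],
}

def correlation(u, v):
    """Normalized correlation of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0

def recognize(measured, memory=REFERENCE_MEMORY, threshold=0.95):
    """Return (feature name or None, best score) for a measured vector."""
    scored = [(name, correlation(measured, ref)) for name, ref in memory.items()]
    best_name, best_score = max(scored, key=lambda item: item[1])
    return (best_name if best_score >= threshold else None), best_score

print(recognize([0.85, 0.15, 0.25]))  # close to the stored "road" vector
print(recognize([0.5, 0.5, 0.5]))     # matches nothing well enough
```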


An  automated  visual  feature  extractor  may  conceptually  consist  of  a  sensor 
unit,  an  information  distribution  unit,  a  central  processing  and  recognition  unit,  and 
an  output  unit.  The  sensor  unit  may  include  various  types  of  sensing  arrays  and  “smart 
sensors,”  which  will  perform  screening  and  initial  classification.  The  information 
distribution  unit  will  include  a  temporary  memory  and  a  transmission  network  to 
conduct  the  signal.  The  central  processing  and  recognition  unit  may  consist  of  a 
number  of  parallel  channels.  Each  channel  is  dedicated  to  extracting  a  single  feature 
and  includes  the  necessary  signal  processing  and  recognition  circuitry.  The  output 
unit  will  provide  the  feature  extraction  information  in  a  format  as  required  by  the 
user.  The  channels  are  arranged  essentially  in  a  parallel  architecture.  This  architecture 
enables  the  extractor  to  grow  organically  from  few  channels  to  many  channels  as 
extraction  and  recognition  methods  are  developed. 
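The parallel-channel arrangement described above can be sketched in software. The two channels, their measures, and the registry are hypothetical stand-ins for dedicated recognition circuitry, and a thread pool only imitates the concurrency that parallel hardware would provide. Each channel receives the same input tile and reports on one feature.

```python
from concurrent.futures import ThreadPoolExecutor

def bright_area_channel(tile):
    """Hypothetical channel: fraction of cells brighter than mid-gray."""
    cells = [v for row in tile for v in row]
    return sum(v > 0.5 for v in cells) / len(cells)

def vertical_edge_channel(tile):
    """Hypothetical channel: mean absolute contrast between horizontal neighbors."""
    diffs = [abs(row[i + 1] - row[i]) for row in tile for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

# Registry of channels; the extractor grows by adding entries here.
CHANNELS = {
    "bright_area": bright_area_channel,
    "vertical_edge": vertical_edge_channel,
}

def extract_features(tile, channels=CHANNELS):
    """Distribute one tile to every channel and collect the outputs."""
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        futures = {name: pool.submit(fn, tile) for name, fn in channels.items()}
        return {name: future.result() for name, future in futures.items()}

# A 4 x 4 tile with a dark left half and a bright right half.
tile = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
report = extract_features(tile)
print(report)
```

Because the channels share nothing but the input, a new feature can be supported by registering one more function without touching the existing ones.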


CONCLUSIONS 


It  is  concluded  that 

1. The success of the visual system is due to its architecture, which
prefers  parallel  processing. 

2.  The  optical  image  is  transformed  by  the  visual  system  into  a  spatial 
and  temporal  pattern  of  neural  signals. 

3. The neural signals are processed progressively into trigger features
of increasing abstraction.

4.  The  trigger  features  of  specific  visual  features,  represented  by 
spatial  and  temporal  patterns  of  neural  signals,  interact  with  the 
visual  memory. 

5.  The  visual  memory  of  human  beings  is  expanded  during  a  lifetime 
by  experience  and  visual  learning. 

6.  The  architecture  of  contemporary  computers  is  not  suited  for 
automated  feature  extraction. 

7. Development trends for Very Large Scale Integration (VLSI)
electronic circuitry, temporary memories (RAM, ROM), microprocessors,
and sensing devices indicate that components for
automated feature extraction systems will be available in the
future.

8. The first step towards automated feature extraction is an in-depth
analysis of the individual feature and the derivation of measurable,
unique parameters, such as feature codes or feature vectors, that
determine the feature unambiguously.

9. A machine feature extractor has to employ a predominantly parallel
processing architecture in order to be efficient.